Principal Component Analysis (PCA)
The Art of Reducing Dimensions Without Losing Insights
What Actually Is PCA?
↳ PCA is a mathematical technique used to transform a high-dimensional dataset into fewer dimensions while retaining as much variability (information) as possible.
↳ Think of it as "compressing" data, similar to how we reduce the size of an image without losing too much detail.
Why Use PCA in Your Projects?
↳ Simplify your data for easier analysis and modeling
↳ Enhance machine learning models by reducing computational cost
↳ Visualize multi-dimensional data in 2D or 3D for insights
↳ Filter out noise and uncover hidden patterns in your data
The Power of Principal Components
↳ The first principal component is the direction in which the data varies the most.
↳ Each subsequent component captures the next highest share of variance, but is orthogonal (uncorrelated) to the previous ones.
↳ The challenge is selecting how many components to keep based on the variance they explain.
Practical Example 1: Customer Segmentation
Imagine you're working on a project to segment customers for a marketing campaign, with data on spending habits, age, income, and location.
↳ Using PCA, you can reduce these four variables into just two principal components that retain 90% of the variance.
↳ These two new components can then be used for k-means clustering to identify distinct customer groups without dealing with the complexity of all the original variables.
The PCA Process: Step-by-Step
↳ Step 1: Data Standardization
Ensure your data is on the same scale (e.g., mean = 0, variance = 1).
↳ Step 2: Covariance Matrix
Calculate how the features are correlated with one another.
↳ Step 3: Eigen Decomposition
Compute the eigenvectors and eigenvalues to determine the principal components.
↳ Step 4: Select Components
Choose the top-k components based on the explained variance ratio.
↳ Step 5: Data Transformation
Project your data onto the new PCA space with fewer dimensions.
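The five steps above can be sketched directly in NumPy (the dataset is synthetic; `eigh` is used because a covariance matrix is symmetric):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 3)) @ rng.normal(size=(3, 3))  # toy data

# Step 1: standardize (mean 0, variance 1 per feature)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(Xs, rowvar=False)

# Step 3: eigen decomposition, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: keep the top-k components by explained variance ratio
ratio = eigvals / eigvals.sum()
k = 2
print(ratio[:k].sum())  # variance retained by the top 2 components

# Step 5: project the data onto the new k-dimensional space
X_pca = Xs @ eigvecs[:, :k]
print(X_pca.shape)  # (150, 2)
```

A useful sanity check: the covariance of the projected data is diagonal, with the kept eigenvalues on the diagonal, confirming the new axes are uncorrelated.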
When Not to Use PCA
↳ PCA is not suitable when the dataset contains non-linear relationships or highly skewed data.
↳ For non-linear data, consider t-SNE or autoencoders instead.
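For instance, a t-SNE embedding with scikit-learn might look like the sketch below; the spiral-shaped data is a made-up example of non-linear structure that a single linear projection cannot unroll:

```python
import numpy as np
from sklearn.manifold import TSNE

# Spiral-like non-linear structure (illustrative toy data)
rng = np.random.default_rng(3)
t = rng.uniform(0, 4 * np.pi, 400)
X = np.column_stack([t * np.cos(t), t * np.sin(t),
                     rng.normal(scale=0.1, size=400)])

# t-SNE preserves local neighborhoods rather than directions of max variance
X_embedded = TSNE(n_components=2, perplexity=30,
                  random_state=0).fit_transform(X)
print(X_embedded.shape)  # (400, 2)
```

Note that unlike PCA, t-SNE has no `transform` for new points; it embeds only the data it was fitted on.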